Abstract

This paper considers the belief propagation algorithm over pair-wise graphical models to develop
low-complexity, iterative multiple-input multiple-output (MIMO) detectors. The pair-wise graphical model is
a bipartite graph in which a pair of variable nodes is related by an observation node represented by the
bivariate Gaussian function obtained by marginalizing the posterior joint probability density under the
Gaussian input assumption. Specifically, we consider two types of pair-wise models, the fully-connected
and the ring-type. The pair-wise graphs are sparse compared to the conventional graphical model in [18],
insofar as the number of edges connected to an observation node (edge degree) is only two. Consequently,
the computations are much simpler than those of maximum likelihood (ML) detection, which are similar to
those of the belief propagation (BP) run over the fully connected bipartite graph. The link-level performance
for non-Gaussian input is evaluated via simulations, and the results show the validity of the proposed
algorithms. We also customize the algorithm with the Gaussian input assumption to obtain the Gaussian BP
run over the two pair-wise graphical models and, for the ring-type, we prove its convergence in mean
to the linear minimum mean square error (MMSE) estimates. Since the maximum a posteriori (MAP)
estimator for Gaussian input is equivalent to the linear MMSE estimator, this shows the optimality, in
mean, of the scheme for Gaussian input.
I. Introduction

Recent work on multiple-input multiple-output (MIMO) detection has mainly focused on so-called
sphere decoding [1]-[6]. Sphere decoding is a two-stage detector in which the channel matrix is first
converted into an upper triangular form and, utilizing this structure, a tree search is used for joint data
detection. Since the full tree search has the same complexity as maximum likelihood (ML) detection, a
reduced search algorithm is applied by limiting the search space, e.g., the number of candidate
symbols or the radius at each tree search stage. One advantage of sphere decoding is that it can, by choosing
an appropriate radius or list size, provide a tradeoff between performance and complexity. The
performance of sphere decoding has been shown to be quite close to that of ML with a reasonable level
of complexity [6]. To produce the soft decisions required for channel decoding, however, the search space
cannot be set too small.

Another type of MIMO detector, which has received little attention, is the channel truncation approach
[7]-[10]. This approach is also a two-stage detector, where the channel is first converted into a
bi-diagonal or, more generally, a poly-diagonal form [9], [10] and, utilizing the effective channel structure,
a trellis search, e.g., the Viterbi algorithm or the forward-backward algorithm [11], [12], is used for the
subsequent joint detection. The method is similar to the concatenated channel-shortening equalizer and maximum
likelihood sequence estimator (MLSE) for the inter-symbol interference channel [13].



Another class of MIMO detection worthy of attention is graph-based detection [14]-[20]. These
approaches are based on the belief propagation (BP) algorithm [21], [22]. This algorithm has also been
extensively studied for the decoding of channel codes, such as turbo codes and low density parity
check codes. In these approaches, the MIMO channel is modeled as a fully-connected bipartite graph,
which consists of N observation nodes representing the received signal, M variable
nodes representing the hidden data, and the edges connecting the observation nodes with the variable
nodes. The resulting graph has the maximal edge degree, i.e., every observation node is connected to every
variable node. When applying the BP algorithm [21] or the sum-product algorithm [22] to such graphs,
the complexity is as high as that of the ML or MAP detector. This is mainly due to the metric computation and
the marginalization operation required for the message update at the observation nodes.



To reduce the computational complexity, the Gaussian BP has been considered in [16] and [17], where
the input data and messages are all assumed to be Gaussian so that the message and posterior probability
can be represented by a pair of mean and variance, resulting in a very simple message update rule. As
shown in [16] and [17], however, the algorithm converges (though not always) only to the linear minimum
mean squared error (LMMSE) solution, which is inferior to the ML detector for non-Gaussian input. On
the other hand, [18] and [20] studied complexity reduction via model simplification. In particular, in [18],
it was suggested that, to reduce the edge degree, some edges in the fully connected bipartite graph should
be pruned based on the strength of the channel coefficients. By doing so, not only is the number of
messages reduced, but the marginalization operation at the observation nodes can also be performed at a
much lower cost. The marginalization cost decreases exponentially with the edge-degree reduction,
resulting in far less complexity than ML. The problem here, however, is that the performance loss becomes
more severe as the edge degree is reduced.



Other interesting graph-based approaches are those in [20], [23]-[25], based on the pair-wise Markov
random field (MRF) [26]. In an MRF, we have only one type of node representing the hidden data and
the edges reflecting the local dependency among them. The local dependency is represented by potential
functions and, specifically in a pair-wise MRF, they are functions of one or two variables. In fact, as noticed
in [20], [23] and [25] (also in [16] and [19]), a multivariate Gaussian function can be decomposed into
a product of functions of one or two variables, resulting in a fully connected pair-wise MRF. On the
other hand, in [20], noticing that BP may not work well for a loopy graph, the authors proposed a tree
approximation on the basis of the Kullback-Leibler distance (KLD) optimality criterion. In [24], the same
authors proposed using the potential functions obtained by two-dimensional projection and
expectation-maximization (EM) algorithm based post detection. Bit-based probabilistic data association
is another approach to low complexity MIMO detection, especially for higher order QAM. In [27], a
matrix representation is introduced to represent the symbol mapping, by which the mapping can be considered as
a linear processing and combined as part of the MIMO channel, giving room for complexity
reduction for higher order QAM.

In this paper, we investigate a similar approach to the pair-wise MRF based MIMO detector, but with a
different formulation, i.e., instead of using the potential functions obtained from the direct decomposition
of the multivariate Gaussian function [20], [24] or from the two-dimensional projection in [24], we propose
using the functions obtained by marginalizing the posterior joint probability density under the Gaussian
input assumption. As implied in [23], [25], the corresponding bipartite graph has an edge degree of
only two and the proposed scheme has much less complexity than that of ML/MAP. In addition to the
fully connected pair-wise graph, we also consider the ring-type pair-wise graph. The proposed scheme
can be regarded as an edge pruning technique, similar to the one in [18]. Unlike that of [18], however,
the pruning is performed by a linear transformation and the performance degradation compared to the
ML/MAP detector is shown to be reasonable with an edge degree of two.
This paper is organized as follows. In the next section, we briefly review the ML/MAP and the
graph-based approach to MIMO detection. In Section III, the proposed iterative detection algorithm is
presented based on the fully-connected and ring-type pair-wise models, respectively, for non-Gaussian
input. In Section IV, we customize the proposed algorithms under the Gaussian input assumption (Gaussian
BP) and discuss their convergence properties. The performance is extensively evaluated and compared via
link-level simulations in Section V and, finally, concluding remarks are given in Section VI.

II. System model, MAP and Graph-based Detection 
System Model: A Gaussian MIMO system with an $N \times M$ channel matrix $H$ ($N > M$) is modeled as

$$ y = Hx + n = \sum_{k=1}^{M} h_k x_k + n $$

where $x$ is an $M \times 1$ transmitted data symbol vector, $n$ is an $N \times 1$ noise vector, $y$ is an $N \times 1$ received
signal vector and $h_m$ is the $m$th column of $H$. The noise $n$ is assumed to be complex Gaussian with
mean $0$ and covariance $E[nn^H] = \sigma^2 I$, and the transmitted data symbol vector $x$ is assumed to have mean $0$
and covariance matrix $E[xx^H] = I$, where $E[\cdot]$ denotes expectation. In practice, each element of $x$ is
usually a $2^m$-ary symbol drawn from a finite alphabet set $\Xi$ of size $2^m$, such as QPSK or 16-QAM.

MAP detection: The maximum a posteriori (MAP) detector selects the $x$ that maximizes the a posteriori
probability

$$ p(x|y) = \frac{p(y|x)\,p(x)}{p(y)} \tag{1} $$

where

$$ p(y|x) = \mathcal{CN}(y; Hx, \sigma^2 I) \tag{2} $$

with $\mathcal{CN}(y;\mu,C)$ representing a multivariate complex Gaussian probability density function (PDF) of
mean $\mu$ and covariance $C$, defined for an $N$-dimensional $y$ as

$$ \mathcal{CN}(y;\mu,C) = \frac{1}{\pi^{N}|C|} \exp\left(-(y-\mu)^H C^{-1} (y-\mu)\right) $$

where the superscript $H$ denotes Hermitian transpose. The search space of the MAP detector is the $M$-dimensional
space $\Xi^M$, and the complexity is $O(2^{mM})$. When MIMO is concatenated with channel coding, the
MIMO detector is required to produce soft-decision values, i.e., log-likelihood ratios (LLRs). Denote the
$j$th data symbol as $x_j(b_{j,1}, b_{j,2}, \ldots, b_{j,m})$, where $b_{j,k}$ is the $k$th bit contained in $x_j$. Then the LLR of $b_{j,k}$ can
be obtained by first marginalizing $p(x|y)$ over $x \backslash x_j = (x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_M)$ to get

$$ p(x_j = x|y) = \sum_{x \backslash x_j} p(x_1, x_2, \ldots, x_j = x, \ldots, x_M|y) = A \cdot \sum_{x \backslash x_j} p(y|x_1, x_2, \ldots, x_j = x, \ldots, x_M) \prod_{k=1}^{M} p(x_k) \tag{3} $$

where $A = p^{-1}(y)$ is a normalizing constant and the $x_j$'s are assumed to be independent of each other. In (3),
$p(x_j)$ is the a priori probability of $x_j$, which is assumed to be uniformly distributed, i.e., $p(x_j) = 1/2^m$
for a modulation size of $2^m$. The LLR for each bit is then computed as



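$$ \mathrm{LLR}(b_{j,k}) = \log \frac{p(b_{j,k} = 1|y)}{p(b_{j,k} = 0|y)} = \log \frac{\sum_{x \in \Xi:\, b_k(x) = 1} p(x_j = x|y)}{\sum_{x \in \Xi:\, b_k(x) = 0} p(x_j = x|y)}. \tag{4} $$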



The prohibitive complexity, when $m$ and $M$ are large, comes from the marginalization operation in (3).
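
To make the cost of the marginalization in (3) concrete, the following is a minimal brute-force sketch for a tiny system with a uniform prior. The function name `map_symbol_marginals`, the toy channel, and the noise variance are illustrative assumptions, not values from this paper; the loop over all candidate vectors is what grows as $2^{mM}$.

```python
import itertools
import math

def map_symbol_marginals(H, y, sigma2, alphabet):
    """Brute-force evaluation of p(x_j = x | y) in (3) with a uniform prior.

    H is a list of N rows, each a list of M complex gains; y is a list of N
    complex samples. Returns marginals[j][s] for symbol index s of the alphabet.
    """
    N, M = len(H), len(H[0])
    marginals = [[0.0] * len(alphabet) for _ in range(M)]
    # Enumerate all |alphabet|^M candidate vectors (exponential in M).
    for idx in itertools.product(range(len(alphabet)), repeat=M):
        x = [alphabet[s] for s in idx]
        dist = sum(abs(y[i] - sum(H[i][k] * x[k] for k in range(M))) ** 2
                   for i in range(N))
        weight = math.exp(-dist / sigma2)  # likelihood up to a constant
        for j, s in enumerate(idx):
            marginals[j][s] += weight
    # Normalize each marginal so that it sums to one (the constant A).
    for j in range(M):
        total = sum(marginals[j])
        marginals[j] = [w / total for w in marginals[j]]
    return marginals

# Toy noiseless 2x2 example with BPSK; all numbers are made up for illustration.
H = [[1.0 + 0.0j, 0.3 + 0.1j], [0.2 - 0.2j, 0.9 + 0.0j]]
x_true = [1.0, -1.0]
y = [sum(H[i][k] * x_true[k] for k in range(2)) for i in range(2)]
post = map_symbol_marginals(H, y, sigma2=0.5, alphabet=[1.0, -1.0])
```

Here `post[j]` holds the symbol posterior of the $j$th stream, from which the bit LLRs in (4) would follow.
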
Graph-based detection (BP over fully-connected bipartite graph): The MAP detection in (3) is useful
for turbo equalization [28], [29], for which a vast amount of literature shows the validity of
iterative MIMO detection and channel decoding. Although turbo equalization is not our main focus in
this paper, it is worth paying attention to the iterative detection shown in [18], i.e., the BP over
the fully-connected bipartite graph. In fact, the MAP detection in (3) can be regarded as a BP that is run
over the singly connected factor graph shown in Fig. 1(a), where each variable node, representing a
data symbol, first passes a priori information to the observation node labeled by the received vector $y$.
The observation node then provides each variable node with the corresponding a posteriori probability
by computing the marginalization in (3). Since the graph is singly connected and all variable nodes are
connected via one observation node, the BP over this graph converges, in one iteration, to the
correct a posteriori probability. The graph-based detection in [18], on the other hand, is a BP over the
fully connected bipartite graph shown in Fig. 1(b), where the marginalization is performed separately
for each observation node and the results are then combined to produce the belief and the extrinsic information
on each data symbol. The algorithm in [18] can be summarized as follows.

BP 1 over the fully-connected factor graph [18]

Given the a priori probability of $x_j$, which is typically assumed to be uniformly
distributed, i.e., $p(x_j) = 1/2^m$ for a modulation size of $2^m$:

(1) Initialization:

$$ \lambda_{j \to i}(x_j) = p(x_j). $$

(2) Observation node computation:

$$ \pi_{i \to j}(x_j) = A \cdot \sum_{x \backslash x_j} p(y_i|x_1, x_2, \ldots, x_M) \prod_{k \neq j} \lambda_{k \to i}(x_k). \tag{5} $$

(3) Belief update:

$$ b(x_j) = \prod_{k=1}^{N} \pi_{k \to j}(x_j). \tag{6} $$

(4) Variable node computation:

$$ \lambda_{j \to i}(x_j) = \prod_{k \neq i} \pi_{k \to j}(x_j) = \frac{b(x_j)}{\pi_{i \to j}(x_j)}. \tag{7} $$

The message updates (5)-(7) are repeated a pre-defined number of times or until the
beliefs no longer change.

Note that, in (5), $p(y_i|x_1, x_2, \ldots, x_j = x, \ldots, x_M)$ is given by

$$ p(y_i|x_1, x_2, \ldots, x_M) = \mathcal{CN}\Big(y_i; \sum_{j=1}^{M} h_{ij} x_j, \sigma^2\Big) $$

and, by combining (5) and (6), we see that, at the first iteration,

$$ b(x_j) \propto \prod_{k=1}^{N} p(x_j|y_k) $$

which is certainly different from $p(x_j|y)$ in (3). That is, in the BP algorithm, we first marginalize $p(x|y_k)$
for each received signal $y_k$ to obtain $p(x_j|y_k)$ and then obtain the belief as their product, while,
in MAP, we marginalize $p(x|y)$ once and for all. Note also that, since the marginalization in (5) is
performed over an $(M-1)$-dimensional space and must be performed for all $2^m$ states of $x_j$,
the complexity of one iteration is the same as that of MAP, and the total complexity is multiplied by
the number of iterations, resulting in far more computation than MAP detection. Regardless of
its complexity, however, it provides a base structure for the development of low complexity detectors.

Complexity Reduction via Edge Pruning: To reduce the computational burden of the marginalization
in (5) for non-Gaussian input, [18] proposed pruning those edges whose variable
and observation nodes are weakly coupled, e.g., those variable-observation node pairs with small
values of $|h_{jk}|$. By using only $d_f < M$ edges per observation node (i.e., pruning $M - d_f$
edges), the complexity is reduced by a factor of $1/2^{m(M - d_f)}$ relative to the ML/MAP or the BP 1 of
complexity $O(2^{mM})$. Here $d_f$ is the edge degree. The problem with this scheme is that $d_f$ must be large
enough to ensure a reasonable performance, as shown in [18].
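
As a quick arithmetic illustration of the pruning gain stated above, the sketch below evaluates the factor $2^{m(M - d_f)}$ for one hypothetical configuration. The helper names and the chosen values of $m$, $M$ and $d_f$ are assumptions made for illustration only.

```python
def marginalization_states(m, degree):
    """Number of symbol combinations summed at one observation node: 2^(m*degree)."""
    return 2 ** (m * degree)

def pruning_gain(m, M, d_f):
    """Reduction factor 2^(m(M - d_f)) obtained by keeping d_f of the M edges."""
    return marginalization_states(m, M) // marginalization_states(m, d_f)

# Illustrative values only: 16-QAM (m = 4), M = 4 streams, edge degree d_f = 2.
gain = pruning_gain(m=4, M=4, d_f=2)
```

With these example values, each observation node sums over $2^{8}$ instead of $2^{16}$ combinations.
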
III. Detection Algorithm based on Pair-wise Graphical Models

In this section, we develop low complexity iterative MIMO detection algorithms based on the pair-wise
graphical models. We consider two types, namely, the fully-connected and the ring-type, and derive the
corresponding BP algorithms that work for non-Gaussian input. As will be shown below, BP over the
ring-type pair-wise graph is, with a slight difference, effectively equivalent to the one in [10].



A. BP based on pair-wise Markov Random Field 

Our starting point is the BP algorithm based on the pair-wise Markov random field (MRF) in [19], [20]
and [25]. An MRF is an undirected graph that describes local dependencies among a set of random variables.
In an MRF, the joint PDF of all random variables involved can be represented by a product of the joint PDFs
of each clique.¹ A pair-wise MRF means that the joint PDF (of all variables involved) is represented by a
product of joint PDFs of only two variables, each corresponding to an edge connecting two neighbors. Let
$V = \{1, 2, \ldots, M\}$ be the set of nodes in the MRF corresponding to the random variables $x_1, x_2, \ldots, x_M$,
respectively, and let $E$ be the set of all edges connecting these nodes. For a compact expression, we
also denote the edge connecting nodes $j$ and $k$ as $e(j,k)$ and the set of neighbors of the $j$th node as
$V(j)$. In pair-wise MRFs, the a posteriori joint function $p(x_1, x_2, \ldots, x_M|y)$ is modeled by a product
of pair-wise potential functions [17], [26], e.g.,

$$ p(x_1, x_2, \ldots, x_M|y) = A \cdot \prod_{i \in V} \psi_i(x_i) \prod_{(i,j):\, e(i,j) \in E} \phi_{ij}(x_i, x_j), \tag{8} $$

where $\psi_i(x_i)$ is the self-potential assigned to each node and $\phi_{ij}(x_i, x_j)$ is the edge potential assigned to each
edge. Such modeling based on a pair-wise MRF also facilitates the marginalization that finally yields
the marginal distribution of each random variable. Denoting the (incoming) message from the $i$th to the
$j$th node as $\pi_{i \to j}(x_j)$, the BP through the pair-wise MRF can be described as [17]

$$ \pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} \psi_i(x_i)\, \phi_{ij}(x_i, x_j) \prod_{k \in V(i) \backslash j} \pi_{k \to i}(x_i) \tag{9} $$

where $\alpha$ is the normalizing constant and $V(i) \backslash j$ is the set of neighbors of node $i$ excluding node $j$. Note that
we follow the convention in [17], [19], and [20] to describe the message passing over an MRF, where only
one type of node, namely the variable node, exists and the messages flow between these variable nodes. When
we use a bipartite graph as shown in Fig. 1(b), we need to define two types of messages, i.e., one from
variable node to observation node and the other from observation node to variable node, which can be
easily obtained by dividing (9) into two separate steps, i.e., $\lambda_{i \to j}(x_i) = \prod_{k \in V(i) \backslash j} \pi_{k \to i}(x_i)$ (variable-to-observation
node message) and $\pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} \psi_i(x_i)\, \phi_{ij}(x_i, x_j) \cdot \lambda_{i \to j}(x_i)$ (observation-to-variable
node message).

¹ A clique in a graph is defined as a set of nodes that are fully connected to each other.

In (9), the incoming messages are combined first to produce the extrinsic information, $\prod_{k \in V(i) \backslash j} \pi_{k \to i}(x_i)$,
which is then "translated" by the potential function, $\psi_i(x_i)\, \phi_{ij}(x_i, x_j)$. The belief on the variable $x_j$
is given by

$$ b(x_j) = \prod_{k \in V(j)} \pi_{k \to j}(x_j). \tag{10} $$

The potential functions in (8) are given by a factorization of the joint a posteriori probability. Specifically,
in [17], [27], the potential functions are obtained by decomposition of the multivariate Gaussian function, i.e.,

$$ \phi_{ij}(x_i, x_j) = A_{ij} \exp\left(-\frac{2}{\sigma^2} \mathrm{Re}[x_i^* R_{ij} x_j]\right) $$

$$ \psi_i(x_i) = A_i \exp\left(\frac{1}{\sigma^2}\left(2\,\mathrm{Re}[x_i^* y'_i] - R_{ii}|x_i|^2\right)\right) \tag{11} $$

where $R_{ij} = h_i^H h_j$ and $y'_i = h_i^H y$, and $*$ denotes complex conjugate. In fact, such a decomposition gives
us a fully connected pair-wise MRF and is exact in the sense that (8) with the functions in (11) is exactly
the same as the joint Gaussian PDF. It has been shown in [16] and [17] that, with (11), the BP over the
fully connected pair-wise MRF results in the MMSE solution if it converges (though convergence
is not always guaranteed for arbitrary channel matrices). Most importantly, however, it does not work well for
non-Gaussian input and the performance is shown to be inferior to the ML/MAP detector, especially for
higher order modulation.
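
As a brief check of why a decomposition into one- and two-variable functions is exact for the Gaussian likelihood, the exponent can be expanded directly. The following is a sketch assuming a uniform prior and counting each unordered pair $(i,j)$, $i < j$, once; the constant term $y^H y$ is absorbed into the normalization.

$$ -\frac{1}{\sigma^2}\|y - Hx\|^2 = -\frac{1}{\sigma^2}\Big[\, y^H y - 2\sum_{i} \mathrm{Re}[x_i^* y'_i] + \sum_{i} R_{ii}|x_i|^2 + 2\sum_{i<j} \mathrm{Re}[x_i^* R_{ij} x_j] \Big] $$

Every term involves at most two symbols, which is why the resulting MRF is pair-wise, and it is fully connected whenever all $R_{ij}$, $i \neq j$, are nonzero.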

B. The proposed BP algorithm over pair-wise graphical models 
In this paper, we propose using the following message passing rule:

$$ \pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} p(x_j|x_i, y) \prod_{k \in V(i) \backslash j} \pi_{k \to i}(x_i) \tag{12} $$

where $p(x_j|x_i, y)$ is the conditional a posteriori probability derived under a Gaussian input assumption,
to be discussed shortly. Compared with (9), the potential function in (9) is replaced with $p(x_j|x_i, y)$.
Note, however, that it is not a factor of the a posteriori probability in (1), unlike those in (11).

The trick here is to use $p(x_j|x_i, y)$ obtained under the Gaussian input assumption in order to approximate
the marginal PDF of non-Gaussian data. Note also that, although the translation function $p(x_j|x_i, y)$ is
obtained under the Gaussian assumption on the data symbols, the message itself, $\pi_{i \to j}(x_j)$, will not be
treated as Gaussian. The rationale for using $p(x_j|x_i, y)$ is to reduce the computational complexity. Let
$\tilde{p}(x_j|x_i, y)$ be the true conditional a posteriori probability without the Gaussian assumption. Further
assume that, after many iterations, the extrinsic information $\prod_{k \in V(i) \backslash j} \pi_{k \to i}(x_i)$ at the $i$th node (a
neighbor of the $j$th node) converges to its true a posteriori marginal distribution, $p(x_i|y)$. Then, with an
appropriate normalizing constant, we also have $\pi_{i \to j}(x_j) \to p(x_j|y)$ for the $j$th node, which means that, once
converged, this translation function ensures that the final belief is given by the true marginal a posteriori
distribution. This is, however, impractical since, before we run the algorithm, we would first need to compute
$\tilde{p}(x_j|x_i, y)$, which has the complexity of ML detection. Hence, at this step, we assume that the $x_j$'s are
all Gaussian to obtain $p(x_j|x_i, y)$, whose computation is much simpler, as discussed shortly.
In short, $p(x_j|x_i, y)$ obtained under the Gaussian input assumption is used to approximate the true
posterior marginal for non-Gaussian input (i.e., $p(x_j|y)$).
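
The consistency argument above can be written out in one line. In this sketch, $\tilde{p}$ denotes the true conditional a posteriori probability (without the Gaussian assumption), a notation used here only to distinguish it from the Gaussian-input version, and the extrinsic information at node $i$ is assumed to have converged to $p(x_i|y)$:

$$ \sum_{x_i \in \Xi} \tilde{p}(x_j|x_i, y)\, p(x_i|y) = \sum_{x_i \in \Xi} p(x_j, x_i|y) = p(x_j|y). $$

With the Gaussian-input translation function in place of $\tilde{p}$, the first equality holds only approximately, which is the source of the performance gap to MAP for non-Gaussian input.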

On the other hand, the conditional PDF, $p(x_j|x_i, y)$, under the Gaussian input assumption can be easily
obtained from the following simple probability relations, i.e.,

$$ p(x_i, x_j|y)\, p(y) = p(y|x_i, x_j)\, p(x_i, x_j) = p(x_j|x_i, y)\, p(x_i, y) $$

resulting in

$$ p(x_j|x_i, y) = \frac{p(x_i, x_j|y)}{p(x_i|y)} = \frac{p(y|x_i, x_j)\, p(x_j)}{p(y|x_i)} \tag{13} $$

where

$$ p(y|x_i, x_j) = \mathcal{CN}(y; h_i x_i + h_j x_j, K_{\{i,j\}}) \tag{14} $$

$$ p(y|x_i) = \mathcal{CN}(y; h_i x_i, K_{\{i\}}) \tag{15} $$

$$ p(x_i) = \mathcal{CN}(x_i; 0, 1) $$

with

$$ K_{\Phi} = \sigma^2 I + \sum_{k \notin \Phi} h_k h_k^H \tag{16} $$

for $\Phi = \{i,j\}$ or $\{i\}$. In the second equality in (13), we used the independence assumption on the $x_j$'s.

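Moreover, the Gaussian input assumption leads us to a much simpler form. First, define the conditional
MMSE estimator for $x_j$ given $x_i$,

$$ c_{j|i} = K_{\{i,j\}}^{-1} h_j \tag{17} $$

and $y'_{j|i} = c_{j|i}^H y$ such that

$$ y'_{j|i} = c_{j|i}^H y = a_{j|i,j}\, x_j + a_{j|i,i}\, x_i + n'_{j|i} \tag{18} $$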

DRAFT April 9, 2013 



PAIR- WISE MARKOV RANDOM FIELDS 



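where

$$ a_{j|i,k} = c_{j|i}^H h_k = h_j^H K_{\{i,j\}}^{-1} h_k \quad \text{for } k = i \text{ or } j \tag{19} $$

$$ E\big[|n'_{j|i}|^2\big] = c_{j|i}^H K_{\{i,j\}}\, c_{j|i} \triangleq \sigma^2_{j|i}. \tag{20} $$

Then, (13) can be rewritten as

$$ p(x_j|x_i, y'_{j|i}) = \frac{p(y'_{j|i}|x_i, x_j)\, p(x_j)}{p(y'_{j|i}|x_i)} \tag{21} $$

with

$$ p(y'_{j|i}|x_i, x_j) = \mathcal{CN}(y'_{j|i}; a_{j|i,j}\, x_j + a_{j|i,i}\, x_i, \sigma^2_{j|i}) \tag{22} $$

$$ p(y'_{j|i}|x_i) = \mathcal{CN}(y'_{j|i}; a_{j|i,i}\, x_i, \sigma^2_{j|i} + |a_{j|i,j}|^2). \tag{23} $$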

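In (23), we used $p(x_j) = \mathcal{CN}(x_j; 0, 1)$. Plugging (22) and (23) into (21) and replacing $p(x_j)$ with
$\mathcal{CN}(x_j; 0, 1)$, we have the simplified translation function from the derivation in the appendix:

$$ p(x_j|x_i, y'_{j|i}) = \frac{\mathcal{CN}(y'_{j|i}; a_{j|i,j}\, x_j + a_{j|i,i}\, x_i, \sigma^2_{j|i})\, \mathcal{CN}(x_j; 0, 1)}{\mathcal{CN}(y'_{j|i}; a_{j|i,i}\, x_i, \sigma^2_{j|i} + |a_{j|i,j}|^2)} $$

$$ = \mathcal{CN}\left(x_j; \frac{a^*_{j|i,j}\,(y'_{j|i} - a_{j|i,i}\, x_i)}{\sigma^2_{j|i} + |a_{j|i,j}|^2}, \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2}\right) $$

$$ = \mathcal{CN}\left(x_j; \frac{y'_{j|i} - a_{j|i,i}\, x_i}{1 + \sigma^2_{j|i}}, \frac{1}{1 + \sigma^2_{j|i}}\right) \tag{24} $$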



where, in the last line, we used the fact that $a_{j|i,j}$ is real valued and is equal to $\sigma^2_{j|i}$. Note that in (24),
the mean is the conditional MMSE estimate of $x_j$ given $x_i$.
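
As a numerical sanity check on the simplification in (24), the sketch below evaluates the mean and variance of the translation function in both the general Gaussian-posterior form and the simplified form, assuming $a_{j|i,j}$ is real and equal to $\sigma^2_{j|i}$ as stated above. The function name `translation_params` and the test values are hypothetical and chosen only for illustration.

```python
def translation_params(y_ji, a_jj, a_ji, x_i):
    """Mean and variance of p(x_j | x_i, y'_{j|i}) as in (24).

    a_jj plays the role of a_{j|i,j}, assumed real and equal to the residual
    noise variance sigma^2_{j|i}; a_ji plays the role of a_{j|i,i}.
    Returns ((mean, var) in general form, (mean, var) in simplified form).
    """
    sigma2 = a_jj  # sigma^2_{j|i} = a_{j|i,j}
    # General Gaussian posterior of x_j ~ CN(0, 1) observed through
    # y' = a_jj * x_j + a_ji * x_i + noise of variance sigma2.
    denom = sigma2 + abs(a_jj) ** 2
    mean_general = a_jj * (y_ji - a_ji * x_i) / denom
    var_general = sigma2 / denom
    # Simplified form after substituting a_jj = sigma2.
    mean_simple = (y_ji - a_ji * x_i) / (1.0 + sigma2)
    var_simple = 1.0 / (1.0 + sigma2)
    return (mean_general, var_general), (mean_simple, var_simple)

# Hypothetical values, for illustration only.
general, simple = translation_params(0.8 + 0.3j, 0.6, 0.2 - 0.1j, 1.0)
```

Both forms agree whenever `a_jj` equals the residual noise variance, so only $y'_{j|i}$, $a_{j|i,i}$ and $\sigma^2_{j|i}$ need to be stored per directed edge.
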



Using (17) to (24), the proposed message passing rule can be summarized as follows.



BP 2 over the fully-connected pair-wise graph

Given the messages in the previous iteration, $\pi_{k \to i}(x_i)$,

(1) Compute the extrinsic information for all pairs $(i,j)$ with $i \neq j$:

$$ \lambda_{i \to j}(x_i) = \prod_{k \in V(i) \backslash j} \pi_{k \to i}(x_i). \tag{25} $$

(2) Translate the message $\lambda_{i \to j}(x_i)$ to $\pi_{i \to j}(x_j)$:

$$ \pi_{i \to j}(x_j) = \alpha \sum_{x_i \in \Xi} p(x_j|x_i, y'_{j|i}) \cdot \lambda_{i \to j}(x_i) \tag{26} $$

with $p(x_j|x_i, y'_{j|i})$ given by (24). The above message passing is computed for all edges
in both directions and repeated a pre-defined number of times or until the
messages no longer change. The belief is finally obtained in the same way as
in (10).
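
The two steps of BP 2 can be sketched for a single directed edge as follows. This is a minimal illustration, assuming the translation function has already been tabulated over the alphabet; the function name `bp2_message` and the small binary example are hypothetical and not taken from the paper.

```python
def bp2_message(trans, incoming):
    """One message update of BP 2, following (25) and (26).

    trans[a][b] holds p(x_j = alphabet[b] | x_i = alphabet[a], y'_{j|i});
    incoming is a list of messages pi_{k->i}(x_i) for k in V(i) excluding j,
    each a list indexed by the symbol of x_i.
    Returns the normalized message pi_{i->j}(x_j).
    """
    size = len(trans)
    # (25): extrinsic information is the product of the incoming messages.
    extrinsic = [1.0] * size
    for msg in incoming:
        extrinsic = [e * p for e, p in zip(extrinsic, msg)]
    # (26): translate through the pair-wise function and marginalize x_i.
    out = [sum(trans[a][b] * extrinsic[a] for a in range(size))
           for b in range(size)]
    total = sum(out)  # plays the role of the normalizing constant alpha
    return [v / total for v in out]

# Hypothetical binary-alphabet example with two incoming messages.
trans = [[0.9, 0.1], [0.2, 0.8]]
msg = bp2_message(trans, [[0.5, 0.5], [0.7, 0.3]])
```

The inner sum runs over $2^m \times 2^m$ symbol pairs per edge, which is the origin of the $2^{2m}$ term in the complexity of the pair-wise schemes.
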
Note that the above algorithm uses two types of messages and can be efficiently described by message
passing over the bipartite graph in Fig. 2(a), where the observations used for the message translation from
the $j$th node to the $i$th and its reverse are denoted by $y'_{i|j}$ and $y'_{j|i}$, respectively. It is also interesting
to note that the above algorithm is similar to the algorithms in [9] and [10], with two differences. One is
in the underlying structure and the other is in the message translation. To clarify the similarity and difference,
we consider the ring-type bipartite graph shown in Fig. 2(b). In this ring-type graph, each node has only
two neighbors and, hence, in the computation of extrinsic information, the incoming message from one
neighbor is simply passed to the other, so that the detection algorithm can be described more concisely and
clearly as follows (even though it is generally applicable to any pair-wise graphical model).

BP 3 over the ring-type pair-wise graph (Forward-backward recursion)

Given the messages in the previous iteration, $\pi_{k \to i}(x_i)$,

(1) Variable node to observation node message:

$$ \lambda_{j \to (j \pm 1)_M}(x_j) = \pi_{(j \mp 1)_M \to j}(x_j) \quad \forall j. \tag{27} $$

(2) Observation node to variable node message:

$$ \pi_{j \to (j \pm 1)_M}(x_{(j \pm 1)_M}) = \sum_{x_j \in \Xi} p(x_{(j \pm 1)_M}|x_j, y'_{(j \pm 1)_M|j}) \cdot \lambda_{j \to (j \pm 1)_M}(x_j) \quad \forall j \tag{28} $$

with $p(x_j|x_i, y'_{j|i})$ given by (24). After a pre-defined number of iterations, the belief
is finally obtained by

$$ b(x_j) = \pi_{(j+1)_M \to j}(x_j) \cdot \pi_{(j-1)_M \to j}(x_j). \tag{29} $$



From (27) to (29), $(\cdot)_M$ denotes the 1-based modulo-$M$ operation such that $(M+1)_M = 1$ and $(0)_M = M$.
Later on, however, we will omit this for notational simplicity.
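
The 1-based modulo operation is easy to get wrong with 0-based language primitives, so a small sketch may help. The helper names `mod1` and `ring_neighbors` are hypothetical and used only for illustration.

```python
def mod1(j, M):
    """1-based modulo-M index: maps any integer j into {1, ..., M}."""
    return (j - 1) % M + 1

def ring_neighbors(j, M):
    """Backward and forward neighbors of node j on the ring of M nodes."""
    return mod1(j - 1, M), mod1(j + 1, M)
```

For example, with $M = 4$ the neighbors of node 1 are nodes 4 and 2, which closes the ring.
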

On the other hand, this message update rule is a forward-backward algorithm similar to those in Q, 
i.e., the message from the $(j-1)$th node to the $j$th node corresponds to the forward message, and the
one from the $(j+1)$th node to the $j$th node corresponds to the backward message. The difference is
in the message translation. In (28), the message translations from the $j$th node to the $i$th and its reverse
utilize different translation functions, i.e.,

$$ y'_{j|i} \neq y'_{i|j} \;\Rightarrow\; p(x_j|x_i, y'_{j|i}) = \frac{p(y'_{j|i}|x_i, x_j)\, p(x_j)}{p(y'_{j|i}|x_i)} \neq p(x_i|x_j, y'_{i|j}). $$

This means the branch metrics used for the forward and backward recursions are separately optimized to
maximize their conditional SINR, as also proposed in [10]. The translation function is also different from
the branch metric in [10], i.e., the mean and variance in (24) have scaling factors of $a^*_{j|i,j}/(\sigma^2_{j|i} + |a_{j|i,j}|^2)$
and $1/(\sigma^2_{j|i} + |a_{j|i,j}|^2)$, respectively, instead of $a^*_{j|i,j}/|a_{j|i,j}|^2$ and $1/|a_{j|i,j}|^2$, though this has a minor impact
on the error rate performance. The bipartite graph corresponding to this algorithm is shown in Fig. 2(b),
where the observations used for the message translation from the $j$th node to the $i$th and its reverse are
denoted by $y'_{i|j}$ and $y'_{j|i}$, respectively. Note that, for the ring-type graph, we obtain different performance
with a different antenna permutation, as also noted in [7], while, in the fully-connected one, we do not
need antenna permutation, which is one possible advantage of the latter over the former.

Since the graphical models have short cycle(s) (especially the fully-connected pair-wise graph), it is
questionable whether or not BP 2 and 3 will converge. In the literature, it is known that the
convergence of BP over a loopy graph is not guaranteed, even though it does converge in most practical
cases. Since the convergence proof for non-Gaussian input is not tractable, we will tackle this question
in the next section by modifying the algorithms for Gaussian input.

C. Complexity 

For complexity comparisons, we need to consider both the linear preprocessing and the subsequent iterative
detection. Consider first the computational complexity of the iterative detection only. In the MAP
detector, the distance metric $|y - Hx|^2$ is computed first for all combinations of $(x_1, x_2, \ldots, x_M) \in \Xi^M$
and then the marginalization in (3) is performed over all combinations of $x \backslash x_j \in \Xi^{M-1}$ for each of the $2^m$
alphabet symbols, resulting in a complexity of $O(M^2 \cdot 2^{mM})$. In comparison,
the computational burden of BP 2 for the fully-connected pair-wise graph in Fig. 2(a) for $\nu$ iterations
is $O(\nu \cdot M(M-1) \cdot 2^{2m})$, since the marginalization for each of the $M$ nodes is performed separately for its
$(M-1)$ neighbors and repeated $\nu$ times. Although some additional computation is required for the linear
processing in (16)-(20), it is typically much smaller than the exponential cost of the MAP marginalization, resulting in a considerable computational
reduction, which comes from modeling through the pair-wise graphical model. On the other
hand, the computational complexity for the ring-type pair-wise graph in Fig. 2(b) is $O(\nu \cdot M \cdot 2^{2m})$,
which is even less than that of the fully-connected one.
To evaluate the approximate number of operations, we assume:

1) The marginalization in (3) for the MAP and the computations in (26) and (28) for BP 2 and 3,
respectively, are performed in the log-domain, where multiplications and additions in these equations
are replaced with additions and max-operations, respectively, and, in (2) and (24), we only need to
compute the exponent.

2) A multiplication of a $(p \times q)$ matrix with a $(q \times r)$ matrix requires $pqr$ multiplications and
$pqr$ additions (of complex numbers).

3) An inversion of a (p x p) square matrix approximately requires 2p^ — 2p^ times of additions, 2p'^ —p^ 
times of multiplications and p^ times of divisions (of complex numbers). 

4) Division of complex numbers requires one complex multiplication and two real divisions. 

5) A complex addition requires two real additions and complex multiplication requires four real mul- 
tiplications and two real additions. 

6) Real addition and multiplication are assumed to have the same complexity of one (operation), while 
real division to have 8 (operations). 
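
The unit costs in assumptions 2) and 4)-6) can be collected into a small cost model. The sketch below encodes only those assumptions (the matrix inversion count of assumption 3 is deliberately left out); the constant and function names are hypothetical.

```python
# Unit costs in "operations", following assumption 6).
REAL_ADD = 1
REAL_MUL = 1
REAL_DIV = 8

# Complex arithmetic costs, following assumptions 4) and 5).
COMPLEX_ADD = 2 * REAL_ADD
COMPLEX_MUL = 4 * REAL_MUL + 2 * REAL_ADD
COMPLEX_DIV = COMPLEX_MUL + 2 * REAL_DIV

def matmul_ops(p, q, r):
    """Operations for a (p x q) by (q x r) complex product, per assumption 2)."""
    return p * q * r * (COMPLEX_MUL + COMPLEX_ADD)
```

Under these assumptions a complex multiplication costs 6 operations and a complex division 22, so a matrix-vector product with a $4 \times 4$ matrix costs `matmul_ops(4, 4, 1)` operations.
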

With these assumptions, we can count the number of operations required to generate the symbol
likelihoods, i.e., the a posteriori likelihood in (3) for the MAP and the final beliefs in BP 2 and BP 3. We
do not count the generation of the LLR for each bit from the symbol likelihood since it is the same for all
detectors. The results are summarized in Table I, where we also show two examples, one with $M = 6$,
$m = 2$, $\nu_1 = 4$, $\nu_2 = 6$ and the other with $M = 4$, $m = 4$, $\nu_1 = 4$, $\nu_2 = 6$, where $\nu_1$ and $\nu_2$ are the
numbers of iterations for BP 2 and BP 3, respectively.
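
To get a feel for the two example configurations, the sketch below evaluates only the dominant combinatorial terms quoted earlier, namely $2^{mM}$ for MAP, $\nu M(M-1) 2^{2m}$ for BP 2 and $\nu M 2^{2m}$ for BP 3. These are order-of-growth terms that ignore polynomial prefactors and preprocessing, not the operation counts of Table I; the function names are hypothetical.

```python
def map_states(m, M):
    """Number of candidate vectors enumerated by MAP: 2^(mM)."""
    return 2 ** (m * M)

def bp2_terms(m, M, iters):
    """Pair-wise terms summed by BP 2: iters * M * (M - 1) * 2^(2m)."""
    return iters * M * (M - 1) * 2 ** (2 * m)

def bp3_terms(m, M, iters):
    """Pair-wise terms summed by BP 3: iters * M * 2^(2m)."""
    return iters * M * 2 ** (2 * m)

# The two example configurations: (M, m) = (6, 2) and (4, 4), with 4 and 6 iterations.
example_a = (map_states(2, 6), bp2_terms(2, 6, 4), bp3_terms(2, 6, 6))
example_b = (map_states(4, 4), bp2_terms(4, 4, 4), bp3_terms(4, 4, 6))
```

The gap between the MAP term and the pair-wise terms widens quickly with the modulation order, since only the former is exponential in $M$.
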



It is interesting to compare the complexity of the proposed schemes with that of the one in [27] (Table
I). As analyzed for BP 2 and BP 3, the complexity can be considered separately for the preprocessing and
the post decoding. For the latter, the complexity of the scheme in [27] should be the same as that of
BP 2, though it would be more complex than that of BP 3 in our proposal. The main difference is in
the preprocessing stage. The complexity of the preprocessing in [27] is much less than that of
the proposed preprocessing since it consists of only two matrix multiplications, i.e., $H^H H$ and $H^H r$,
which require $M^3 + M^2$ complex multiplications and the same number of complex additions.

IV. Message Passing with Gaussian Input 
In Section III, we developed BP algorithms run over the pair-wise bipartite graphs for non-Gaussian
messages. The Gaussian assumption on the $x_j$'s was employed only to obtain the translation function in (24),
whereas the exact marginalization was used in the message translation step. In this section, we further
simplify the message passing rule by extending the Gaussian assumption to the message translation
step, as was done in [16], [17], [19], to obtain the Gaussian BP over the two graphical models under
consideration.

ML detection with Gaussian input: With independent and identically distributed Gaussian input, $p(x) = \prod_{j=1}^{M} \mathcal{CN}(x_j; 0, 1)$, the MAP detector in (3) becomes

$$ p(x_j|y) = A \cdot \int \cdots \int \mathcal{CN}(y; Hx, \sigma^2 I) \prod_{k=1}^{M} \mathcal{CN}(x_k; 0, 1)\, d(x \backslash x_j) = \mathcal{CN}(x_j; h_j^H K^{-1} y, 1 - h_j^H K^{-1} h_j) \tag{30} $$

where we appropriately select the normalization constant $A$, and the covariance matrix $K$ is given by
$K = HH^H + \sigma^2 I$. Note that, in (30), the mean is the linear MMSE estimate of $x_j$ and the variance
is the corresponding minimum MSE, i.e.,

$$ \hat{x}_j = h_j^H K^{-1} y \tag{31} $$

$$ \mathrm{MMSE}_j = 1 - h_j^H K^{-1} h_j. \tag{32} $$

This means that linear MMSE estimation is optimum for Gaussian input, while this does not hold for
non-Gaussian input.
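
As a worked special case of (31) and (32), consider a single stream, $M = 1$ with $H = h$, which is an illustrative assumption rather than a case treated in the text. Since $K h = (hh^H + \sigma^2 I) h = (\|h\|^2 + \sigma^2) h$, we have $K^{-1} h = h / (\|h\|^2 + \sigma^2)$, and therefore

$$ \hat{x} = \frac{h^H y}{\|h\|^2 + \sigma^2}, \qquad \mathrm{MMSE} = 1 - \frac{\|h\|^2}{\|h\|^2 + \sigma^2} = \frac{\sigma^2}{\|h\|^2 + \sigma^2}. $$

This is the familiar scaled matched filter: the estimate shrinks toward the prior mean $0$ as the noise variance grows, and the MSE tends to the prior variance $1$.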

A. Gaussian BP over the proposed pair-wise graphs 

Assuming that the $x_j$'s are Gaussian and that the distributions $\pi_{i \to j}(x_j)$ and $b(x_i)$ are all Gaussian PDFs, they
can be characterized by their mean and variance only. This means that the update rules for the messages $\pi_{i \to j}(x_j)$ and the belief
$b(x_i)$ in BP 2 and 3 can be replaced with update rules for the mean and variance pair. Since the
Gaussian BP corresponding to the BP 1 over the fully connected pair-wise graph in Q-Q has already 
been discussed in p^ , we consider here only the BP 2 and 3 over the two pair-wise graphical models. 

Let us denote the mean and variance pairs of the complex Gaussian PDFs $\pi_{i\to j}(x_j)$ and $b(x_i)$ as $(\mu_{\pi,i\to j}, \sigma^2_{\pi,i\to j})$ and $(\mu_i, \sigma_i^2)$, respectively. Then, BP 2 and 3 under the Gaussian input assumption can be rewritten as follows (detailed derivations are shown in the appendix):

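Gaussian BP 2G over the fully-connected pair-wise graph

Given the messages in the previous iteration (or the initial messages), $\pi_{i\to j}(x_j) = (\mu_{\pi,i\to j}, \sigma^2_{\pi,i\to j})$ $\forall (i,j): i \neq j$, they are recursively updated by

\[
\sigma^2_{\pi,i\to j} = \frac{1}{1+\sigma^2_{j|i}} + \frac{|a_{j|i,i}|^2}{(1+\sigma^2_{j|i})^2} \cdot \frac{1}{\sum_{k\in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}} \tag{33}
\]

\[
\mu_{\pi,i\to j} = \frac{y'_{j|i}}{1+\sigma^2_{j|i}} - \frac{a_{j|i,i}}{1+\sigma^2_{j|i}} \cdot \frac{\sum_{k\in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}\, \mu_{\pi,k\to i}}{\sum_{k\in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}}. \tag{34}
\]

After a number of iterations of the above, the final belief on $x_i$ is obtained by

\[
\sigma_i^{-2} = \sum_{k\in V(i)} \sigma^{-2}_{\pi,k\to i} \tag{35}
\]

\[
\mu_i = \frac{\sum_{k\in V(i)} \sigma^{-2}_{\pi,k\to i}\, \mu_{\pi,k\to i}}{\sum_{k\in V(i)} \sigma^{-2}_{\pi,k\to i}}. \tag{36}
\]

Gaussian BP 3G over the ring-type MRF (Gaussian forward-backward recursion)

Given the messages in the previous iteration, $\pi_{i\to i\pm 1}(x_{i\pm 1}) = (\mu_{\pi,i\to i\pm 1}, \sigma^2_{\pi,i\to i\pm 1})$ $\forall i$, they are recursively updated by

\[
\sigma^2_{\pi,i\to i\pm 1} = \frac{1}{1+\sigma^2_{i\pm 1|i}} + \frac{|a_{i\pm 1|i,i}|^2}{(1+\sigma^2_{i\pm 1|i})^2} \cdot \sigma^2_{\pi,i\mp 1\to i} \tag{37}
\]

\[
\mu_{\pi,i\to i\pm 1} = \frac{1}{1+\sigma^2_{i\pm 1|i}}\, y'_{i\pm 1|i} - \frac{a_{i\pm 1|i,i}}{1+\sigma^2_{i\pm 1|i}} \cdot \mu_{\pi,i\mp 1\to i}. \tag{38}
\]

After a number of iterations of the above, the final belief on $x_i$ is obtained by

\[
\sigma_i^{-2} = \sigma^{-2}_{\pi,i+1\to i} + \sigma^{-2}_{\pi,i-1\to i} \tag{39}
\]

\[
\mu_i = \frac{\sigma^{-2}_{\pi,i+1\to i}\, \mu_{\pi,i+1\to i} + \sigma^{-2}_{\pi,i-1\to i}\, \mu_{\pi,i-1\to i}}{\sigma^{-2}_{\pi,i+1\to i} + \sigma^{-2}_{\pi,i-1\to i}}. \tag{40}
\]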
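The two update rules share one structure: combine the incoming messages into an extrinsic Gaussian, then push it through the translation function. The sketch below illustrates this in plain Python, assuming the translation function has mean $(y'_{j|i} - a_{j|i,i} x_i)/(1+\sigma^2_{j|i})$ and variance $1/(1+\sigma^2_{j|i})$. The function names `translate` and `belief` and all numeric inputs are hypothetical; the pair-wise quantities are taken as given inputs rather than computed from a channel.

```python
def translate(y_p, a_cross, s2, incoming):
    """Gaussian message i -> j built from the messages k -> i, k != j.

    y_p, a_cross, s2: assumed pair-wise quantities y'_{j|i}, a_{j|i,i}, sigma^2_{j|i}
    incoming: list of (mean, variance) pairs; one pair on the ring (3G),
              several pairs on the fully-connected graph (2G)
    """
    precision = sum(1.0 / var for _, var in incoming)
    ext_var = 1.0 / precision                       # extrinsic variance
    ext_mean = sum(mean / var for mean, var in incoming) * ext_var
    var = 1.0 / (1.0 + s2) + abs(a_cross) ** 2 / (1.0 + s2) ** 2 * ext_var
    mean = (y_p - a_cross * ext_mean) / (1.0 + s2)
    return mean, var

def belief(incoming):
    """Final belief: precision-weighted combination of all incoming messages."""
    precision = sum(1.0 / var for _, var in incoming)
    mean = sum(m / var for m, var in incoming) / precision
    return mean, 1.0 / precision

# Hypothetical numbers: two incoming messages with unit variance.
msgs = [(0.2, 1.0), (0.6, 1.0)]
out_mean, out_var = translate(1.0, 0.5, 1.0, msgs)
b_mean, b_var = belief(msgs)
```

With a single incoming message the function reduces to the ring-type forward or backward step, which is why the same helper can serve both graphs in this sketch.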



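Particularly, in the Gaussian BP 3G, we observe the following:

1) The variance and mean are updated separately (except in the final belief).

2) In (37) and (38), there are two separate message flows; one is the forward flow from $i$ to $i+1$ and the other is the backward flow from $i$ to $i-1$.

3) Eq. (38) can be rewritten as

\[
\text{Forward recursion:}\quad \mu_{\pi,i\to i+1} = F_i \circ \mu_{\pi,i-1\to i} \tag{41}
\]

\[
\text{Backward recursion:}\quad \mu_{\pi,i\to i-1} = B_i \circ \mu_{\pi,i+1\to i} \tag{42}
\]

where the operations $F_i$ and $B_i$ are first-order elementary functions defined as

\[
F_i \circ \mu = u_{i+1,i} + v_{i+1,i} \cdot \mu \tag{43}
\]

\[
B_i \circ \mu = u_{i-1,i} + v_{i-1,i} \cdot \mu \tag{44}
\]

with

\[
u_{j,i} = \frac{y'_{j|i}}{1+\sigma^2_{j|i}} = \frac{h_j^H K^{-1}_{\{i,j\}} y}{1 + h_j^H K^{-1}_{\{i,j\}} h_j} = h_j^H K^{-1}_{\{i\}} y \tag{45}
\]

\[
v_{j,i} = -\frac{a_{j|i,i}}{1+\sigma^2_{j|i}} = -\frac{h_j^H K^{-1}_{\{i,j\}} h_i}{1 + h_j^H K^{-1}_{\{i,j\}} h_j} = -h_j^H K^{-1}_{\{i\}} h_i. \tag{46}
\]

Here, we used (17)-(20) and, in the last equality, the matrix inversion lemma

\[
(A + BB^H)^{-1} = A^{-1} - A^{-1}B(I + B^H A^{-1} B)^{-1} B^H A^{-1}.
\]

4) Similar to the means, (37) can also be rewritten as

\[
\text{Forward recursion:}\quad \sigma^2_{\pi,i\to i+1} = F'_i \circ \sigma^2_{\pi,i-1\to i} \tag{47}
\]

\[
\text{Backward recursion:}\quad \sigma^2_{\pi,i\to i-1} = B'_i \circ \sigma^2_{\pi,i+1\to i} \tag{48}
\]

where

\[
F'_i \circ \sigma^2 = w_{i+1,i} + z_{i+1,i} \cdot \sigma^2 \tag{49}
\]

\[
B'_i \circ \sigma^2 = w_{i-1,i} + z_{i-1,i} \cdot \sigma^2 \tag{50}
\]

with

\[
w_{j,i} = \frac{1}{1+\sigma^2_{j|i}} = \frac{1}{1 + h_j^H K^{-1}_{\{i,j\}} h_j} \tag{51}
\]

\[
z_{j,i} = \frac{|a_{j|i,i}|^2}{(1+\sigma^2_{j|i})^2} = \frac{|h_j^H K^{-1}_{\{i,j\}} h_i|^2}{(1 + h_j^H K^{-1}_{\{i,j\}} h_j)^2} = |h_j^H K^{-1}_{\{i\}} h_i|^2. \tag{52}
\]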



B. Convergence of Gaussian BP 



Regarding the convergence of Gaussian BP, it was previously shown in [30] that, if Gaussian BP for an arbitrary topology converges, it converges to the correct mean (see also [31]). It was shown in [16] that the Gaussian BP over the factor graph in Fig. 1(b) yields the linear MMSE solution when it converges, even though its convergence is not assured. Based on these findings, we can conjecture that, for both the Gaussian BP of rules 2G and 3G, the mean converges to the linear MMSE solution, as also verified by simulations in the next section. One way to prove the convergence would be to use the idea of the "unwrapped tree" presented in [30]. In our case, however, this would be a tedious derivation. Therefore, we try an alternative approach that works for GBP 3G, but not for GBP 2G. Note, however, that the derivation here differs from [16], [17] in the underlying graphical model and the translation function used. The objective in this subsection is to prove the following theorem.

Theorem 1: In the Gaussian BP 3G over the ring-type pair-wise graph, the mean converges to the linear MMSE estimate (31) for non-zero noise power as the number of iterations approaches infinity.

The proof is based on the following Lemmas.
Lemma 2: For an arbitrary initial value $\mu(0)$, both the forward and backward recursions for the mean in (38) converge, respectively, to a unique fixed point.

Proof: Define one iteration as one complete turn of message passing along the ring and consider, without loss of generality, the message at Node 1. Based on observations 1) through 3) in the previous subsection, we obtain the recursive relations for Node 1, i.e., using an arbitrary initial value $\mu(0)$, we have

\[
\mu_{\pi,1\to 2}(n) = (F_M \circ \cdots \circ F_3 \circ F_2 \circ F_1 \circ)\, \mu_{\pi,1\to 2}(n-1) = (F_M \circ \cdots \circ F_3 \circ F_2 \circ F_1 \circ)^n\, \mu(0) \tag{53}
\]

\[
\mu_{\pi,1\to M}(n) = (B_2 \circ B_3 \circ \cdots \circ B_M \circ B_1 \circ)\, \mu_{\pi,1\to M}(n-1) = (B_2 \circ B_3 \circ \cdots \circ B_M \circ B_1 \circ)^n\, \mu(0) \tag{54}
\]

where $n$ is the iteration number and the collective operations for one iteration of the forward/backward recursion are given, respectively, by

\[
F_{1,T} \circ \mu = F_M \circ \cdots \circ F_3 \circ F_2 \circ F_1 \circ \mu = f_{1,U} + f_{1,V}\, \mu \tag{55}
\]

\[
B_{1,T} \circ \mu = B_2 \circ B_3 \circ \cdots \circ B_M \circ B_1 \circ \mu = b_{1,U} + b_{1,V}\, \mu \tag{56}
\]

for some constants $f_{1,U}$, $f_{1,V}$, $b_{1,U}$, and $b_{1,V}$, which, in turn, are polynomials in the $u_{j,i}$ and $v_{j,i}$ of (43) and (44). For example, we have for $M = 4$

\[
F_4 \circ F_3 \circ F_2 \circ F_1 \circ \mu = (u_{1,4} + v_{1,4}u_{4,3} + v_{1,4}v_{4,3}u_{3,2} + v_{1,4}v_{4,3}v_{3,2}u_{2,1}) + (v_{1,4}v_{4,3}v_{3,2}v_{2,1}) \cdot \mu
\]

\[
B_2 \circ B_3 \circ B_4 \circ B_1 \circ \mu = (u_{1,2} + v_{1,2}u_{2,3} + v_{1,2}v_{2,3}u_{3,4} + v_{1,2}v_{2,3}v_{3,4}u_{4,1}) + (v_{1,2}v_{2,3}v_{3,4}v_{4,1}) \cdot \mu
\]

for which

\[
\begin{aligned}
f_{1,U} &= u_{1,4} + v_{1,4}u_{4,3} + v_{1,4}v_{4,3}u_{3,2} + v_{1,4}v_{4,3}v_{3,2}u_{2,1} \\
f_{1,V} &= v_{1,4}v_{4,3}v_{3,2}v_{2,1} \\
b_{1,U} &= u_{1,2} + v_{1,2}u_{2,3} + v_{1,2}v_{2,3}u_{3,4} + v_{1,2}v_{2,3}v_{3,4}u_{4,1} \\
b_{1,V} &= v_{1,2}v_{2,3}v_{3,4}v_{4,1}.
\end{aligned}
\]

Here, we can show that $f_{i,V}$ and $b_{i,V}$ are given, respectively, by

\[
f_{i,V} = \prod_{j=1}^{M} v_{j,j-1}, \qquad b_{i,V} = \prod_{j=1}^{M} v_{j,j+1} \tag{57}
\]

where the node indices are taken cyclically along the ring. On the other hand, using (55) and (56), (53) and (54) become

\[
\mu_{\pi,1\to 2}(n) = (F_{1,T}\circ)^n \mu(0) = f_{1,U} \cdot \sum_{k=0}^{n-1} f_{1,V}^k + f_{1,V}^n \cdot \mu(0) \tag{58}
\]

\[
\mu_{\pi,1\to M}(n) = (B_{1,T}\circ)^n \mu(0) = b_{1,U} \cdot \sum_{k=0}^{n-1} b_{1,V}^k + b_{1,V}^n \cdot \mu(0) \tag{59}
\]

where, from the fact to be proved in the next Lemma that $|f_{1,V}| < 1$ and $|b_{1,V}| < 1$, we have

\[
f_{1,V}^n \cdot \mu(0) \to 0, \qquad b_{1,V}^n \cdot \mu(0) \to 0 \quad \text{as } n \to \infty.
\]

Therefore, the unique fixed point of the mean in GBP 3G is given by

\[
\lim_{n\to\infty} \mu_{\pi,1\to 2}(n) = f_{1,U} \cdot \sum_{k=0}^{\infty} f_{1,V}^k = \frac{f_{1,U}}{1 - f_{1,V}} \tag{60}
\]

\[
\lim_{n\to\infty} \mu_{\pi,1\to M}(n) = b_{1,U} \cdot \sum_{k=0}^{\infty} b_{1,V}^k = \frac{b_{1,U}}{1 - b_{1,V}}. \tag{61}
\]
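The argument of Lemma 2 is that a full turn around the ring is a single affine map whose slope has magnitude below one. The following plain-Python sketch illustrates this with arbitrary, hypothetical coefficients `us` and `vs` standing in for $u_{j,i}$ and $v_{j,i}$; it collapses the turn into the pair $(f_U, f_V)$ and compares the iterated message with the closed-form fixed point $f_U/(1-f_V)$.

```python
def compose_ring(us, vs):
    """Collapse F_M o ... o F_1, where F_i(mu) = us[i] + vs[i] * mu, into (f_U, f_V)."""
    f_U, f_V = 0.0, 1.0                 # identity map: mu -> 0 + 1 * mu
    for u, v in zip(us, vs):            # F_1 is applied first, F_M last
        f_U, f_V = u + v * f_U, v * f_V
    return f_U, f_V

# Hypothetical coefficients for a ring with M = 4 nodes.
us = [0.3 + 0.1j, -0.2j, 0.5, 0.1 - 0.4j]
vs = [0.6, -0.5j, 0.4 + 0.3j, -0.7]
f_U, f_V = compose_ring(us, vs)

mu = 10.0 - 3.0j                        # arbitrary initial value mu(0)
for _ in range(200):                    # 200 complete turns along the ring
    for u, v in zip(us, vs):
        mu = u + v * mu

fixed_point = f_U / (1 - f_V)
print(abs(f_V), abs(mu - fixed_point))
```

The slope `f_V` is the product of the individual slopes, so the influence of the initial value decays geometrically at rate `abs(f_V)` per turn.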



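Lemma 3: $|f_{i,V}| = \left|\prod_{j=1}^{M} v_{j,j-1}\right| < 1$ and $|b_{i,V}| = \left|\prod_{j=1}^{M} v_{j,j+1}\right| < 1$ for all $i$.

Proof: By plugging (46) into (57), we have for all $i$

\[
\begin{aligned}
|f_{i,V}| &= \left| \prod_{j=1}^{M} h_j^H K^{-1}_{\{j-1\}} h_{j-1} \right| \\
&= \left| h_1^H K^{-1}_{\{M\}} h_M\, h_M^H K^{-1}_{\{M-1\}} h_{M-1} \cdots h_2^H K^{-1}_{\{1\}} h_1 \right| \\
&\overset{(a)}{=} \left| \mathrm{tr}\left( h_1 h_1^H K^{-1}_{\{M\}}\, h_M h_M^H K^{-1}_{\{M-1\}} \cdots h_2 h_2^H K^{-1}_{\{1\}} \right) \right| \\
&\overset{(b)}{\le} \prod_{j=1}^{M} \mathrm{tr}\left( h_j h_j^H K^{-1}_{\{j-1\}} \right) = \prod_{j=1}^{M} h_j^H K^{-1}_{\{j-1\}} h_j \\
&\overset{(c)}{=} \prod_{j=1}^{M} \frac{h_j^H K^{-1}_{\{j,j-1\}} h_j}{1 + h_j^H K^{-1}_{\{j,j-1\}} h_j} = \prod_{j=1}^{M} \frac{\sigma^2_{j|j-1}}{1 + \sigma^2_{j|j-1}} < 1
\end{aligned}
\]

where (a) follows from the fact that $a^H b = \mathrm{tr}(b a^H)$ for arbitrary vectors $a$ and $b$, and (b) results from $\mathrm{tr}(AB) \le \mathrm{tr}(A)\mathrm{tr}(B)$ for arbitrary non-negative definite matrices $A$ and $B$. Also, (c) follows from the matrix inversion lemma

\[
\begin{aligned}
h_j^H K^{-1}_{\{j-1\}} &= h_j^H \left( K_{\{j,j-1\}} + h_j h_j^H \right)^{-1} \\
&= h_j^H \left( K^{-1}_{\{j,j-1\}} - K^{-1}_{\{j,j-1\}} h_j \left(1 + h_j^H K^{-1}_{\{j,j-1\}} h_j\right)^{-1} h_j^H K^{-1}_{\{j,j-1\}} \right) \\
&= \frac{1}{1 + h_j^H K^{-1}_{\{j,j-1\}} h_j}\, h_j^H K^{-1}_{\{j,j-1\}}.
\end{aligned}
\]

For the backward recursion, $|b_{i,V}| < 1$ can also be proved in a similar way.

In (58) and (59), we see that the convergence rate depends on

\[
\left| \prod_{i=1}^{M} h_i^H K^{-1}_{\{i-1\}} h_{i-1} \right| \le \prod_{i=1}^{M} \frac{\sigma^2_{i|i-1}}{1 + \sigma^2_{i|i-1}} < 1
\]

which is similar to the result in [16]. Note that $h_i^H K^{-1}_{\{i-1\}} h_{i-1}$ reflects the channel correlation between neighboring antennas.

On the other hand, the operations $F_i$ and $B_i$ are not commutative, such that $F_i \circ F_j \circ \mu$ and $F_j \circ F_i \circ \mu$ may be different, and so are $F_{j,T} \circ \mu$ and $F_{i,T} \circ \mu$ for $j \neq i$. That is, the fixed point for each node may differ from one another.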
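The claim of Lemma 3 can be checked numerically for a particular channel draw. The NumPy sketch below evaluates the loop gains $f_V$ and $b_V$ from $v_{j,i} = -h_j^H K^{-1}_{\{i\}} h_i$, under the assumption that $K_{\{i\}}$ denotes $K$ with the contribution of $h_i$ removed, i.e., $K_{\{i\}} = K - h_i h_i^H$. The seed, dimension, and noise variance are hypothetical, and a single draw is an illustration rather than a proof.

```python
import numpy as np

rng = np.random.default_rng(2)                      # hypothetical seed
M, noise_var = 4, 0.5                               # hypothetical setup
H = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2)
K = H @ H.conj().T + noise_var * np.eye(M)

def v(j, i):
    """v_{j,i} = -h_j^H K_{i}^{-1} h_i, assuming K_{i} = K - h_i h_i^H."""
    h_i, h_j = H[:, i], H[:, j]
    K_i = K - np.outer(h_i, h_i.conj())
    return -(h_j.conj() @ np.linalg.solve(K_i, h_i))

# Loop gains of one full turn (node indices taken cyclically).
f_V = np.prod([v(j, (j - 1) % M) for j in range(M)])   # forward direction
b_V = np.prod([v(j, (j + 1) % M) for j in range(M)])   # backward direction
print(abs(f_V), abs(b_V))
```

Individual factors `v(j, i)` may exceed one in magnitude; it is the product around the whole ring that stays below one.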



The following two Lemmas show that the fixed points in (60) and (61) are both equal to the MMSE estimate in (31).

Lemma 4: In the forward recursion, $\mu_{\pi,i\to i+1}(n)$ is the linear MMSE estimate of $x_{i+1}$ provided that the previous message, $\mu_{\pi,i-1\to i}(n)$, is the linear MMSE estimate of $x_i$. Likewise, in the backward recursion, $\mu_{\pi,i\to i-1}(n)$ is the linear MMSE estimate of $x_{i-1}$ provided that $\mu_{\pi,i+1\to i}(n)$ is the linear MMSE estimate of $x_i$.

Proof: With $c_j = K^{-1} h_j$, the linear MMSE estimate of $x_j$ is given by $h_j^H K^{-1} y$. Hence, the proof is to show from (41) and (42) that

\[
\begin{aligned}
h_{i+1}^H K^{-1} y &= F_i \circ (h_i^H K^{-1} y) = u_{i+1,i} + v_{i+1,i} \cdot (h_i^H K^{-1} y) \\
h_{i-1}^H K^{-1} y &= B_i \circ (h_i^H K^{-1} y) = u_{i-1,i} + v_{i-1,i} \cdot (h_i^H K^{-1} y)
\end{aligned} \tag{62}
\]

where $u_{j,i}$ and $v_{j,i}$ are given by (45) and (46). Plugging these into the right-hand side of (62) for the forward recursion, we finally have

\[
h_{i+1}^H \left( K^{-1}_{\{i\}} - \frac{K^{-1}_{\{i\}} h_i h_i^H K^{-1}_{\{i\}}}{1 + h_i^H K^{-1}_{\{i\}} h_i} \right) y = h_{i+1}^H K^{-1} y.
\]

Similarly, for the backward recursion, we obtain

\[
u_{i-1,i} + v_{i-1,i} \cdot (h_i^H K^{-1} y) = h_{i-1}^H K^{-1} y.
\]



Lemma 5: Both the fixed points, $\frac{f_{i,U}}{1 - f_{i,V}}$ and $\frac{b_{i,U}}{1 - b_{i,V}}$, are equal to the MMSE estimate of $x_i$, i.e., $h_i^H K^{-1} y$.

Proof: Without loss of generality, let us consider the first data symbol, $x_1$. Starting from $h_2^H K^{-1} y = F_1 \circ (h_1^H K^{-1} y)$, we can successively apply the operations $F_2 \circ, F_3 \circ, \ldots, F_M \circ$ to finally obtain

\[
\begin{aligned}
F_M \circ F_{M-1} \circ \cdots \circ F_2 \circ F_1 \circ (h_1^H K^{-1} y) &= F_{1,T} \circ (h_1^H K^{-1} y) \\
&= f_{1,U} + f_{1,V} \cdot (h_1^H K^{-1} y) \\
&= h_1^H K^{-1} y
\end{aligned}
\]

where the first and second equalities are obtained by the definitions of $F_{1,T}\circ$ and $(f_{1,U}, f_{1,V})$, respectively, and the last equality is from Lemma 4, i.e., $h_1^H K^{-1} y = F_M \circ (h_M^H K^{-1} y)$. From the last equality, we obtain $\frac{f_{1,U}}{1 - f_{1,V}} = h_1^H K^{-1} y$ and, in a similar way, we can also prove that $\frac{b_{1,U}}{1 - b_{1,V}} = h_1^H K^{-1} y$.

Proof: The proof of Theorem 1 is now obvious from the above lemmas, i.e., from Lemma 2 the mean in the Gaussian BP over the ring-type graphical model converges to a unique fixed point, and Lemma 5 shows that the fixed point of the mean is equal to the linear MMSE estimate in (31). ■
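Theorem 1 can be illustrated numerically by running the forward and backward mean recursions around the ring and comparing the result with the linear MMSE estimate. The NumPy sketch below does this for one random channel, assuming $u_{j,i} = h_j^H K^{-1}_{\{i\}} y$ and $v_{j,i} = -h_j^H K^{-1}_{\{i\}} h_i$ with $K_{\{i\}} = K - h_i h_i^H$ (the interpretation of $K_{\{i\}}$ is an assumption here). The seed, sizes, initial value, and number of turns are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(3)                      # hypothetical seed
M, noise_var = 4, 0.5                               # hypothetical setup
H = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
K = H @ H.conj().T + noise_var * np.eye(M)
x_mmse = H.conj().T @ np.linalg.solve(K, y)         # linear MMSE estimate of every symbol

def u_v(j, i):
    """Coefficients of the affine step i -> j, assuming K_{i} = K - h_i h_i^H."""
    h_i, h_j = H[:, i], H[:, j]
    K_i = K - np.outer(h_i, h_i.conj())
    u = h_j.conj() @ np.linalg.solve(K_i, y)
    v = -(h_j.conj() @ np.linalg.solve(K_i, h_i))
    return u, v

fwd = [u_v((i + 1) % M, i) for i in range(M)]       # F_i: message i -> i+1
bwd = [u_v((i - 1) % M, i) for i in range(M)]       # B_i: message i -> i-1

mu_f = np.zeros(M, dtype=complex)                   # mu_f[j]: forward message arriving at node j
mu_b = np.zeros(M, dtype=complex)                   # mu_b[j]: backward message arriving at node j
mu_f[0] = mu_b[0] = 5.0 - 2.0j                      # arbitrary initial value mu(0)
for _ in range(1000):                               # complete turns along the ring
    for i in range(M):                              # forward: 0 -> 1 -> ... -> M-1 -> 0
        u, v = fwd[i]
        mu_f[(i + 1) % M] = u + v * mu_f[i]
    for i in range(M, 0, -1):                       # backward: 0 -> M-1 -> ... -> 1 -> 0
        u, v = bwd[i % M]
        mu_b[(i - 1) % M] = u + v * mu_b[i % M]

print(np.max(np.abs(mu_f - x_mmse)), np.max(np.abs(mu_b - x_mmse)))
```

Since both incoming means at each node agree with the MMSE estimate after convergence, any precision-weighted combination of them, as in the final belief, returns the same value regardless of the message variances.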

Note that Theorem 1 holds for any channel matrix if the noise variance is not zero since, for $\sigma^2 > 0$, the covariance matrices $K_{\{i,j\}}$ in (16) are always invertible, so that the MMSE estimator in (17) and the translation functions in (24) certainly exist for all pairs $(i,j)$.

Since the message-update rules for the variance in (47) and (48) have the same form as those in (41) and (42), we can also prove the convergence of the variance in GBP 3G, which can be summarized by the following Lemma.

Lemma 6: For an arbitrary initial value $\sigma^2(0)$, both the forward and backward recursions for the variance in (33) and (35) converge, respectively, to a unique fixed point.



The proof is similar to that of Lemma 3. Unfortunately, however, the fixed point is not necessarily correct. That is, it may not equal the MMSE in (32), as also confirmed in [30]. In [29], the BP over such a ring-type graph was shown to be optimal for binary input. For Gaussian input, however, it is optimal only in the mean, i.e., the mean converges to a fixed point that is equal to the MMSE estimate $\hat{x}_j$ in (31). It is not optimal in a strict sense, since the fixed point of the variance is not equal to the MSE in (32), i.e., the MAP estimate of the variance.

It is also worth comparing GBP 2G and 3G proposed in this paper with the Gaussian BP in [17], [19], and [16], all of which are based on the direct decomposition of the Gaussian PDF and, as noted in [19], are the same algorithm. The comparison can be made in two aspects, namely complexity and convergence. In complexity, the Gaussian BP in [17], [19], and [16] is much simpler than GBP 2G and 3G proposed here. Note that (1) the Gaussian BP in [17], [19], and [16] does not require preprocessing while GBP 2G and 3G in this paper do, and (2) the complexity of the post iteration for the former is the same as that of GBP 2G since they utilize the same graphical model, even though the post iteration of GBP 3G is slightly less complex than that of GBP 2G. Based on this, the overall complexity of the proposed GBP 2G and 3G is certainly higher than that of the algorithms in [17], [19], and [16]. Now, let us consider their convergence. Both GBP 3G proposed in this paper and the Gaussian BP in [17], [19], and [16] result in an MMSE solution (in mean) if they converge, as proved here for GBP 3G and in [16] for the Gaussian BP with the direct decomposition. This means that, once converged, they perform the same. Unfortunately, the convergence of the Gaussian BP in [17], [19], and [16] does not seem to be assured, while GBP 3G is guaranteed to converge.

V. Simulation Results 

In this section, we present simulation results for the iterative algorithms with and without channel coding. For channel coding, we used the DVB-S2 LDPC code of rate 3/4 and length 64800 [32]. The performances of ML, MMSE, and the bi-diagonalization approach in [9] are also evaluated as references. In the transmitter, a block (48600 bits) of random information bits is generated first, coded using the LDPC encoder, interleaved with a random interleaver, and modulated into a sequence of $2^m$-ary symbols. The symbol sequence is then divided into sub-blocks of $M$ symbols, each of which is fed to a transmit antenna, where $M$ corresponds to the number of transmit antennas. At the receiver, the sequence of received vectors is passed to a MIMO detector, which generates the estimates of symbol likelihoods and LLRs for each coded bit. The LLRs are then de-interleaved and decoded by using a generic LDPC decoder.^1 Note that no 'turbo principle' is applied since it is not our focus in this paper. This means that the LDPC decoding begins only after the inner iteration in the MIMO detector is finished. Regarding the MIMO channel, we generated, for each transmitted data vector, an independent and identically distributed (i.i.d.) MIMO channel matrix, of which each element is also an i.i.d. complex Gaussian random variable with mean 0 and variance 1. The resulting channel can be regarded as a fully interleaved frequency selective MIMO channel that can be seen on top of orthogonal frequency division multiplexing (OFDM), especially for those channels where the transmission bandwidth is much larger than the channel coherence bandwidth.
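For concreteness, one channel use of the uncoded QPSK setup described above could be generated as in the following NumPy sketch. The function name `draw_channel_use`, the bit-to-symbol mapping, and the square $M$ x $M$ dimension are illustrative assumptions; the paper does not specify its mapping, and the coding and interleaving stages are omitted.

```python
import numpy as np

def draw_channel_use(M, snr_db, rng):
    """One i.i.d. Rayleigh MIMO channel use with unit-energy QPSK inputs (illustrative)."""
    noise_var = 10 ** (-snr_db / 10)                # SNR is defined as 1 / sigma^2
    # Each channel entry is complex Gaussian with mean 0 and variance 1.
    H = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2)
    bits = rng.integers(0, 2, size=(M, 2))          # two bits per QPSK symbol
    # Assumed mapping: bit 0 -> +1, bit 1 -> -1 on each of the two axes.
    x = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)
    n = np.sqrt(noise_var / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
    return H, x, H @ x + n, noise_var

H, x, y, noise_var = draw_channel_use(4, 10.0, np.random.default_rng(0))
```

A fresh channel matrix is drawn on every call, which corresponds to the fully interleaved channel assumed in the simulations.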

Fig. 3 shows a comparison of bit error rate performance as a function of signal-to-noise ratio (SNR) ($1/\sigma^2$) for ML, BP1 in [18], MMSE, the bi-diagonalization approach in [9], and the proposed BP-based detector with the fully-connected and ring-type pair-wise models. We use a 4 x 4 antenna configuration
and QPSK modulation. We could confirm from Fig. [3] that the pair-wise MRF-based detector performs as 
well as the ML with soft decisions (i.e., using Q and Q). The SNR gap between the proposed scheme 
and the ML is shown to be around 0.1 dB and 0.3 dB for the fully-connected and ring-type graphs, respectively.

Fig. 4 shows a comparison of bit error rate performance without channel coding as a function of signal-to-noise ratio (SNR) ($1/\sigma^2$) for ML, MMSE, and the proposed detector with the fully-connected and ring-type models. We used the same antenna configuration and modulation size, but without channel coding. As shown in the figure, the tendency in the relative performance looks similar to that with channel coding.

^1 In the transmitter and the receiver, the interleaving/de-interleaving and channel coding/decoding are used if channel coding is applied.
It is also worth comparing the performance of BP2 and BP3 with the one in [27] (Table I), where the pair-wise MRF obtained by the direct decomposition of the Gaussian PDF is used. The performance comparison is shown in Fig. 5 for BPSK modulation and the same LDPC coding. As shown in the figure, the performance of the one in [27] is almost the same as that of BP2. We set the number of iterations to 4 for BP2 and BP3 and 6 for the one in [27]. We also tried to obtain the results for 4-PAM. Unfortunately, however, the algorithm in [27] failed to work for 4-PAM, and this is one of the advantages of using the proposed scheme over the existing (fully-connected) MRF-based MIMO detection.^2

Fig. 6 shows the BER performance for a 6 x 6 antenna configuration with the same modulation and
channel coding. The SNR gap between the proposed scheme and the ML is now approximately 0.75 
dB for the fully connected pair-wise graph and 1 dB for the ring-type one, respectively. Although the 
performance degradation compared to the ML is larger than for a 4 x 4 antenna configuration, the SNR 
gain over the MMSE detector is around 3.5 dB. 

In Fig. 7, the BER performance with a higher modulation order (16-QAM) is shown. We used a 4 x 4 antenna configuration and the same channel coding. Here, the SNR gap between the proposed method and the ML is shown to be around 1 dB for BP2 over the fully connected pair-wise graph and 0.7 dB for BP3 over the ring-type graph. Note that the performance of BP2 over the fully connected pair-wise graph is now worse than that of BP3 over the ring-type graph. Here, we set the number of iterations of BP2 and BP3 to four and six, respectively. One possible reason why the fully-connected graph performs worse than the ring-type graph for higher-order QAM can be inferred from the convergence behavior shown in Fig. 9, where the convergence for the fully-connected pair-wise graph stalls at three or four iterations and the BER increases sharply with further iterations, while, for the ring-type graph, it converges steadily.

In Figs. 3 to 7, the number of iterations was set based on the simulation results in Figs. 8 and 9, which we performed with different numbers of antennas and modulation sizes to give insights into how many iterations are required for a satisfactory performance. As shown in the simulation results, the number of iterations required for convergence depends on the modulation size, but not much on the number of antennas. Specifically, for BP3, more iterations are needed for the eventual convergence with a higher modulation size. For BP2, the convergence behavior with different numbers of antennas looks similar to that of BP3 (specifically for QPSK), while it is quite different from that of BP3 with different modulation sizes. Specifically, the performance of BP2 over the fully connected graph does not improve with more than 3 or 4 iterations. Rather, it degrades, especially for higher-order modulation. In BP3 over the ring-type graph, however, no degradation has been observed with more iterations. As mentioned previously, the condition for guaranteed convergence in a loopy graph is still an open problem, and the difference in the convergence behavior of BP2 and BP3 can only be explained by the note in [21], i.e., in a densely connected graph, the messages may circulate along the short loops, preventing the eventual convergence. The fully connected pair-wise graph in Fig. 2(a) is more densely connected than the ring-type pair-wise graph in Fig. 2(b). Although the messages propagate faster in a densely connected graph than in a sparsely connected graph, resulting in faster convergence, the message circulation may prevent the eventual convergence with more iterations.

^2 The reason we consider here only one-dimensional constellations like BPSK or 4-PAM is that the algorithm in [27] is applicable only to those real constellations and we just wanted to use the algorithm as is, since any modification may cause unexpected results.

Another point we need to note is that, in BP3, one can allow a slight performance degradation for a 
large computational saving. Certainly, as shown in Fig. 9 and implied in Fig. 7, at least 10 iterations
is needed for eventual convergence for 16QAM. However, comparing the required SNR for, say, 10~^ 
BER, the difference between 6 and 12 iterations is less than 0.1 dB while, in computational burden, 12 
iterations is twice that of 6. 



Fig. 10 shows the convergence behavior of the Gaussian BP discussed in Section IV. We plotted the 
bit error rate performance of the Gaussian BP over the fully-connected and ring-type pair-wise graph, 
respectively, with various numbers of iterations. As can be seen in the figure, both GBP 2G and 3G 
converge to the performance of linear MMSE detector, though it requires many more iterations than 
those of BP2 and BP3. The only difference between GBP 2G and 3G is the rate of convergence. On the 
other hand, in the high SNR region, the performance appears to worsen with higher SNR. However, it 
should be noted that with higher SNR eventual convergence simply requires more iterations. 

VI. Conclusions 

In this paper, low complexity, iterative MIMO detection algorithms were derived as a message passing 
over the pair-wise bipartite graphs with the translation functions that are obtained by marginalizing the 
posterior joint probability density under the Gaussian input assumption. We investigated two models, the 
fully-connected and ring-type pair-wise graphs. The latter is shown to be an extension of the previous work in [9], [10]. The two pair-wise graphical models are rather sparse in the sense that the number of edges connected to an observation node, i.e., the edge degree, is only two and, thus, the message passing becomes much easier than that over the fully connected bipartite graph.

We also investigated the proposed algorithm under Gaussian input assumption. It was shown that, for 
the Gaussian BP over the ring-type pair-wise graph, the mean converges to the linear MMSE estimates, 
even though the variance converges to a different value from the MMSE obtained by MAP estimation. 



These results are in line with those in [16], [17], [30], [31]. Gaussian BP over the fully-connected pair-wise graph shows a faster convergence rate than Gaussian BP over the ring-type graph.

As proved in this paper, the convergence of the Gaussian BP 3G over the ring-type graph is guaranteed. 
This does not, however, appear to be the case for non-Gaussian messages. The performance of BP 2 for the non-Gaussian case degrades with more than four iterations. This phenomenon might stem from the short
cycles in their graphical model and may be avoided by utilizing "global iteration" between MIMO 
detection and channel decoding. That is, by employing an appropriate channel code and interleaver, 
message circulation along local short cycles can be broken up not only for steady convergence but also 
for better performance. We leave this for our future work. 

Appendix
DETAILED DERIVATIONS OF (24) AND THE GAUSSIAN BP



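To derive (24) and the Gaussian BP rules, (33)-(40), we use the properties of the Gaussian PDF in [21], some of which are as follows.

1) $\mathcal{CN}(x; \mu, \sigma^2) = \mathcal{CN}(\mu; x, \sigma^2) = \mathcal{CN}(x - \mu; 0, \sigma^2) = \mathcal{CN}(\mu - x; 0, \sigma^2)$

2) $\mathcal{CN}(ax + b; \mu, \sigma^2) \propto \mathcal{CN}\left(x; \frac{\mu - b}{a}, \frac{\sigma^2}{|a|^2}\right)$

3) $\mathcal{CN}(x; \mu_1, \sigma_1^2) \cdot \mathcal{CN}(x; \mu_2, \sigma_2^2) = \mathcal{CN}\left(x; \frac{\sigma_1^{-2}\mu_1 + \sigma_2^{-2}\mu_2}{\sigma_1^{-2} + \sigma_2^{-2}}, \frac{1}{\sigma_1^{-2} + \sigma_2^{-2}}\right) \cdot \mathcal{CN}(\mu_1; \mu_2, \sigma_1^2 + \sigma_2^2)$

4) $\int \mathcal{CN}(x; \mu_1, \sigma_1^2) \cdot \mathcal{CN}(x; \mu_2, \sigma_2^2)\, dx = \mathcal{CN}(\mu_1; \mu_2, \sigma_1^2 + \sigma_2^2)$.

Using these, (24) is obtained by direct computation as follows.

\[
\begin{aligned}
p(x_j|x_i, y'_{j|i}) &= \frac{\mathcal{CN}\left(y'_{j|i};\ a_{j|i,j}x_j + a_{j|i,i}x_i,\ \sigma^2_{j|i}\right) \cdot \mathcal{CN}(x_j; 0, 1)}{\mathcal{CN}\left(y'_{j|i};\ a_{j|i,i}x_i,\ \sigma^2_{j|i} + |a_{j|i,j}|^2\right)} \\
&\propto \frac{\mathcal{CN}\left(x_j;\ \frac{1}{a_{j|i,j}}(y'_{j|i} - a_{j|i,i}x_i),\ \frac{\sigma^2_{j|i}}{|a_{j|i,j}|^2}\right) \cdot \mathcal{CN}(x_j; 0, 1)}{\mathcal{CN}\left(y'_{j|i};\ a_{j|i,i}x_i,\ \sigma^2_{j|i} + |a_{j|i,j}|^2\right)} \\
&= \mathcal{CN}\left(x_j;\ \frac{a^*_{j|i,j}\,(y'_{j|i} - a_{j|i,i}x_i)}{\sigma^2_{j|i} + |a_{j|i,j}|^2},\ \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2}\right)
\end{aligned} \tag{63}
\]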
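Properties 3) and 4) carry most of the derivation, so a quick numerical check may be helpful. The plain-Python sketch below evaluates both sides of property 3) at one test point for circularly symmetric complex Gaussians; the helper names `cn_pdf` and `cn_product` and all numeric values are hypothetical.

```python
import math

def cn_pdf(x, mu, var):
    """Circularly symmetric complex Gaussian density CN(x; mu, var)."""
    return math.exp(-abs(x - mu) ** 2 / var) / (math.pi * var)

def cn_product(mu1, var1, mu2, var2):
    """Mean, variance, and scale factor of CN(x; mu1, var1) * CN(x; mu2, var2)."""
    var = 1.0 / (1.0 / var1 + 1.0 / var2)           # precisions add
    mu = var * (mu1 / var1 + mu2 / var2)            # precision-weighted mean
    scale = cn_pdf(mu1, mu2, var1 + var2)           # factor that is independent of x
    return mu, var, scale

mu1, var1, mu2, var2 = 0.5 + 1.0j, 0.8, -0.3 + 0.2j, 1.5
mu, var, scale = cn_product(mu1, var1, mu2, var2)
x = 0.1 - 0.4j                                      # arbitrary test point
lhs = cn_pdf(x, mu1, var1) * cn_pdf(x, mu2, var2)
rhs = cn_pdf(x, mu, var) * scale
print(lhs, rhs)
```

Because the combined density integrates to one over $x$, the scale factor alone is the value of the integral in property 4).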



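Now, we derive the message update rule of the Gaussian BP. To this end, we divide the message update rule in BP2, (26), into two steps, i.e., the extrinsic information computation, $\lambda_{i\to j}(x_i) = \prod_{k\in V(i)\setminus j} \pi_{k\to i}(x_i)$, and the message translation step, $\pi_{i\to j}(x_j) = \alpha \sum_{x_i} p(x_j|x_i, y)\, \lambda_{i\to j}(x_i)$. Assuming the Gaussian messages, $\pi_{k\to i}(x_i) = \mathcal{CN}(x_i; \mu_{\pi,k\to i}, \sigma^2_{\pi,k\to i})$, the former is given by

\[
\begin{aligned}
\lambda_{i\to j}(x_i) &= \prod_{k\in V(i)\setminus j} \pi_{k\to i}(x_i) = \prod_{k\in V(i)\setminus j} \mathcal{CN}(x_i; \mu_{\pi,k\to i}, \sigma^2_{\pi,k\to i}) \\
&\propto \mathcal{CN}\left(x_i;\ \frac{\sum_{k\in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}\, \mu_{\pi,k\to i}}{\sum_{k\in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}},\ \frac{1}{\sum_{k\in V(i)\setminus j} \sigma^{-2}_{\pi,k\to i}}\right) \\
&\triangleq \mathcal{CN}(x_i; \mu_{\lambda,i\to j}, \sigma^2_{\lambda,i\to j}).
\end{aligned} \tag{64}
\]

For the message translation, we first rewrite (63) as

\[
\begin{aligned}
p(x_j|x_i, y'_{j|i}) &= \mathcal{CN}\left(x_j;\ \frac{a^*_{j|i,j}\,(y'_{j|i} - a_{j|i,i}x_i)}{\sigma^2_{j|i} + |a_{j|i,j}|^2},\ \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2}\right) \\
&\propto \mathcal{CN}\left(x_i;\ \frac{y'_{j|i}}{a_{j|i,i}} - \frac{\sigma^2_{j|i} + |a_{j|i,j}|^2}{a^*_{j|i,j}\, a_{j|i,i}}\, x_j,\ \frac{\sigma^2_{j|i}\left(\sigma^2_{j|i} + |a_{j|i,j}|^2\right)}{|a_{j|i,j}|^2 |a_{j|i,i}|^2}\right).
\end{aligned} \tag{65}
\]

Then, by plugging (64) and (65) into (26) and changing the summation into an integral, we have

\[
\begin{aligned}
\pi_{i\to j}(x_j) &= \int p(x_j|x_i, y'_{j|i}) \cdot \lambda_{i\to j}(x_i)\, dx_i \\
&\propto \int \mathcal{CN}\left(x_i;\ \frac{y'_{j|i}}{a_{j|i,i}} - \frac{\sigma^2_{j|i} + |a_{j|i,j}|^2}{a^*_{j|i,j}\, a_{j|i,i}}\, x_j,\ \frac{\sigma^2_{j|i}\left(\sigma^2_{j|i} + |a_{j|i,j}|^2\right)}{|a_{j|i,j}|^2 |a_{j|i,i}|^2}\right) \cdot \mathcal{CN}(x_i; \mu_{\lambda,i\to j}, \sigma^2_{\lambda,i\to j})\, dx_i \\
&\propto \mathcal{CN}\left(x_j;\ \frac{a^*_{j|i,j}\,(y'_{j|i} - a_{j|i,i}\, \mu_{\lambda,i\to j})}{\sigma^2_{j|i} + |a_{j|i,j}|^2},\ \frac{\sigma^2_{j|i}}{\sigma^2_{j|i} + |a_{j|i,j}|^2} + \frac{|a_{j|i,j}|^2 |a_{j|i,i}|^2\, \sigma^2_{\lambda,i\to j}}{\left(\sigma^2_{j|i} + |a_{j|i,j}|^2\right)^2}\right)
\end{aligned} \tag{66}
\]

\[
\triangleq \mathcal{CN}(x_j; \mu_{\pi,i\to j}, \sigma^2_{\pi,i\to j}). \tag{67}
\]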

By comparing the mean and variance in (66) and (67), we obtain the message passing rules of (33) and (34), respectively. The belief in (35) and (36) can be obtained similarly to the derivation in (64).

TABLE I

Fig. 1. Bipartite graphs for a 4 x 4 MIMO channel. The circles are variable nodes corresponding to a data symbol and the boxes labeled by $y$ and $y_j$ are observation nodes corresponding to the received signal.

Fig. 2. The bipartite graph for (a) fully-connected pair-wise model and (b) ring-type pair-wise model, respectively, for a 4 x 4 MIMO channel.

Fig. 3. A comparison of bit error rate performance of MMSE, MAP, and the proposed detectors as a function of SNR ($1/\sigma^2$); 4 x 4 antenna configuration, QPSK modulation with DVB-S2 LDPC code of rate 3/4 (length 64800).

Fig. 4. A comparison of bit error rate performance of MMSE, MAP, and the proposed detectors as a function of SNR ($1/\sigma^2$); 4 x 4 antenna configuration, QPSK modulation, no channel coding.

Fig. 5. A comparison of bit error rate performances of MMSE, the proposed detectors, and the detector in [25] with a damping factor 0.45.

Fig. 6. A comparison of bit error rate performance of MMSE, MAP, and the proposed detectors as a function of SNR ($1/\sigma^2$); 6 x 6 antenna configuration, QPSK modulation with DVB-S2 LDPC code of rate 3/4 (length 64800).



[Plot for Fig. 7. Legend: MMSE; MRF-Ring, N_iter = 6; MRF-Ring, N_iter = 12; MRF-Full Connection, N_iter = 4. Horizontal axis: SNR [dB].]



Fig. 7. A comparison of bit error rate performance of MMSE, MAP, and the proposed detectors as a function of SNR ($1/\sigma^2$); 4 x 4 antenna configuration, 16QAM modulation with DVB-S2 LDPC code of rate 3/4 (length 64800).

Fig. 8. Convergence property of the proposed algorithm with the number of antennas M = 4, 6 and 8; QPSK modulation, DVB-S2 LDPC code of rate 3/4 (length 64800).

Fig. 9. Convergence property of the proposed algorithm with modulation size of $2^m$ for m = 2, 3, 4 and 5; 4 x 4 antenna configuration, DVB-S2 LDPC code of rate 3/4 (length 64800).

Fig. 10. Bit error rate performance of the Gaussian BP over the fully-connected and ring-type pair-wise graph, respectively; 4 x 4 antenna configuration, QPSK modulation, no channel coding.